Wrapper-based deep feature optimization for activity recognition in the wearable sensor networks of healthcare systems

The Human Activity Recognition (HAR) problem leverages pattern recognition to classify physical human activities as they are captured by several sensor modalities. Remote monitoring of an individual’s activities has gained importance due to the reduction in travel and physical activities during the pandemic. Research on HAR enables one person to either remotely monitor or recognize another person’s activity via the ubiquitous mobile device or by using sensor-based Internet of Things (IoT). Our proposed work focuses on the accurate classification of daily human activities from both accelerometer and gyroscope sensor data after converting into spectrogram images. The feature extraction process follows by leveraging the pre-trained weights of two popular and efficient transfer learning convolutional neural network models. Finally, a wrapper-based feature selection method has been employed for selecting the optimal feature subset that both reduces the training time and improves the final classification performance. The proposed HAR model has been tested on the three benchmark datasets namely, HARTH, KU-HAR and HuGaDB and has achieved 88.89%, 97.97% and 93.82% respectively on these datasets. It is to be noted that the proposed HAR model achieves an improvement of about 21%, 20% and 6% in the overall classification accuracies while utilizing only 52%, 45% and 60% of the original feature set for HuGaDB, KU-HAR and HARTH datasets respectively. This proves the effectiveness of our proposed wrapper-based feature selection HAR methodology.


Scientific Reports
| (2023) 13:965 | https://doi.org/10.1038/s41598-022-27192-w www.nature.com/scientificreports/ Limitations of previous works. Current research on HAR is focused on deep learning and machine learning approaches because of its improved accuracy, robustness and speed compared to traditional techniques 4,5 . Several distinct deep learning approaches have emerged over the last few decades, which introduces on various parts of the HAR pipeline. Present-day state of-the-art machine learning models require an adequately large dataset for training them in order to produce accurate results. A number of diverse datasets have been published to overcome this challenge. Previous studies have investigated the problems of HAR using popular machine learning algorithms such as Random Forest (RF), Support Vector Machine (SVM), etc. and deep learning architectures such as Convolutional Neural Network (CNN), Long Short-Term Memory (LSTM), Recurrent Neural Network (RNN) and Artificial Neural Network (ANN) to name a few. Almost all machine learning algorithms face their main difficulty in the time-consuming feature engineering and human feature extraction process. The exigent need for optimized feature selection is the presence of redundant and less significant data among the complete set of available feature space. This increases the computation time for the model to train and the presence of noise reduces the performance. However, feature selection is not synonymous with dimensionality reduction since the former does not alter the attributes of the feature space but just removes them to reduce model complexity. Various research fields have seen the application of feature selection to solve numerous problems such as COVID-19 detection 6 , Indian spoken language identification 7,8 and speech emotion recognition 9 .
The robustness and accuracy that feature selection provides to select deep learning features is leveraged to identify human activities, which has seen little application of feature selection. The main advantages of feature selection are: • Reduces the possibility for the model to train noise and redundant features.
• Improves accuracy of the model by eliminating less important data.
• The machine learning or deep learning models can train faster since they can train fewer number of datasets.
In the context of supervised learning, feature selection can be divided into the following broad categories: • Filter method: The link between each input variable and the target variable is assessed using statistical techniques using filter feature selection methods, and the scores obtained are then used to choose (filter) the input variables that will be incorporated in the model. incorporated the benefits of both filter and wrapper approaches. These are quick processing techniques that are comparable to the filter approach but are more precise.
The goal of detecting various human activities is better suited to deep learning techniques because they automatically train the features from the images or data. In this paper, two popular transfer learning models have been chosen for feature extraction due to the robustness and advancement of deep learning. Transfer learning is a type of supervised learning, where the input remains the same but the target output can be of some other nature. As transfer learning model frameworks can yield a certain set of features, some of which may be identified to be redundant 10 using Binary Bat Algorithm (BBA). We have proposed a hybrid model using a combination of transfer learning models 11 and a wrapper optimizer algorithm based on BBA, to build an efficient model in order to identify human activities. The schematic diagram of our proposed work is also shown in Fig. 1. The entire framework can be further divided into three sub-modules.
1. Conversion of both accelerometer and gyroscope sensor data into spectrogram 12 images. 2. Extraction of the deep features using two pre-trained transfer learning models. 3. Selection of optimal features using wrapper-based optimization method using BBA 13 and classification of each activity by the optimal feature subset.
We have experimented on three standard and recently developed HAR datasets as benchmark, which are: Human Activity Recognition Trondheim (HARTH) 14 , Khulna University-HAR (KU-HAR) 15 and Human Gait DataBase (HuGaDB) 16 using the proposed framework and the same has been mentioned in later sections of this paper. This paper is structurally arranged as follows: A brief literature review to highlight prior efforts on sensorbased data for the HAR problem is presented in Literature Analysis, whereas the architectural details of our proposed model along with the dataset description are presented in Materials and Methods. Experimental Results Section provides the experimental findings attained by the proposed HAR framework. Finally, the comprehensive summary of our work and future research are concluded in Conclusions & future works.
Motivation of proposed work. Researchers used several machine learning and CNN-based models in their work, as was discovered after examining the prior research on these datasets. For all three HAR datasets, no generic model was applied. The sensor-based HAR problem has been transformed into an image classification problem using a robust model that first transforms the sensor data into the corresponding spectrogram images. As per our knowledge, this type of transformation (from raw sensor data to spectrogram images) has been done • For feature extraction from the spectrogram images, Efficientnet_b0 and Mobilenet_v3_Large, two wellknown transfer learning-based models, are applied. • The best optimal feature subset from the retrieved feature set is chosen using the BBA. Finally, the classification of human activities has been carried out using the optimal feature subset.
The contributions of the proposed research are as follows: 1. The proposed work transforms the raw sensor data into its corresponding spectrogram images, which is done for the first time in the domain of sensor-based HAR. 2. The compact input images are fed as input to both Efficientnet_b0 and Mobilenet_v3_Large architectures for deep feature extraction from the penultimate layer of the networks since, due to the less amount of available data, an end-to-end classification scheme using CNN models fail. 3. The extracted features from the two pre-trained transfer learning models (trained with two different compact inputs) have been concatenated to form a final feature space, which is then fed into the BBA for the selection of an optimal feature subset. 4. The final classification on the optimal feature set is done by using K-nearest neighbor (KNN) classifier, thus achieving commendable results on the three publicly available HAR datasets that have been used to evaluate the proposed sensor-based HAR model.

Literature analysis
HAR is one the important and highly researched field of computer vision since its beginning. The aim of HAR system is to identify activities performed by a person. HAR finds application in various areas like health monitoring, gesture based systems, intuitive interfaces for machines (gaming consoles), and surveillance-based security. It has evolved to the point where the recognized method can be generalized and has accurate recognition of human activity as possible 17 . The HAR methodology's complexity is related to the various data inputs that can come from different modalities such as videos, images, audio signals, wearable sensors and other sources. Based on various experiments performed by researchers, many HAR models have been developed in order to improve the overall performance and quality metrics 18 . Several distinct deep learning approaches have emerged over the last few decades, each of which innovates on different areas of the HAR pipeline 19 . Generally, deep learning methodologies emphasises on features whereas Reinforcement learning emphasises on feedback. On the other hand, traditional machine learning focuses on reaping the benefits of planting fruits and beans, whereas transfer learning can draw inferences from both. Recently, deep learning models are being used in various research areas of computer vision as well as pattern recognition domains. For solving HAR problem, Mukherjee et al. 20 have developed an ensemble model named as EnsemConvNet in their work and achieved recognition accuracy of about 97% on WISDM dataset. Furthermore, Das et al. 21 proposes a multi-modal HAR model called MMHAR-EnsemNet and able to achieve accuracy of about 99% on both UTD-MHAD and Berkeley-MHAD datasets. Bhattacharya el al. 22 proposed a deep learning model, named as SV-NET, in order to recognize the human activities from video images. Banerjee et al. 23 proposed a fuzzy integral-based CNN classifier model for skeleton-based HAR problem. Additionally, different applications of CNN models for solving image classification problems proposed by Bhattacharya et al. 24 , Chattopadhyay et al. 25,26 . Channel equalization and channel selectivity in CNNs have been employed for the   34 . The recent survey confirmed this, revealing that only 30 out of 142 accelerometer-based datasets were publicly available 35 . The authors trained the dataset by using state-of-the-art machine learning models and were able to achieve the best F1-score of 81% by using a SVM classifier. Like the HARTH dataset, we have selected the KU-HAR dataset which got recently developed in March 2021 by Niloy Sikder et al. 15 . Their main aim was to introduce a new smartphone sensor data (based on both gyroscope and accelerometer) with new activity classes that will assist researchers in developing more delicate models for designing a real-world HAR framework. They achieved nearly 90% accuracy by using RF classifier. Apart from the HARTH and KU-HAR datasets to test the robust behaviour of our proposed model, we have selected the standard HuGaDB dataset which was introduced by Roman Chereshnev et al. 16 in July 2017. This is a human gait data collection made freely available in UCI Machine Learning repository. Gochoo et al. 36 applied hierarchical feature-based technique to extract the feature which is then optimized using Stochastic gradient descent (SGD) technique and attained 92.50% accuracy. A hybrid feature selection model using deep belief networks was proposed by Madiha Javeed et al. 37 and able to achieve 92.5% accuracy on HuGaDB dataset. Bin Fang el al. 38 applied CNN-based model and achieved 79.24% accuracy. Yingnan Sun et al. 39 proposed a novel ANN based classification model for real-time gait analysis and achieved 88% accuracy. Girja Kumari et al. 40 used LSTM based deep learning classifier model in their proposed work and achieved 91.1% accuracy on HuGaDB dataset. Several researchers used the same dataset for their research. A brief summarization of existing HAR models, developed in the literature, for HARTH, KU-HAR and HuGaDB datasets are included in Table 1.

Materials and methods
Dataset description. In our proposed work, we have experimented with three publicly available HAR datasets: 1. HARTH 14 dataset 2. KU-HAR 15 dataset 3. HuGaDB 16 dataset The detailed discussion related to the above-mentioned HAR datasets is described below.
HARTH dataset. The HARTH dataset is accessible to the general public. A total of 22 individuals who wore two three-accelerometers on their lower back, and thigh provided the acceleration data for this report. The HARTH dataset has 12 human activities or classification labels. Activities and their corresponding IDs are described in Table 2 whereas the number of samples per activity class present in HARTH dataset are shown in Fig. 2.  Table 3 whereas the number of samples per activity class present in KU-HAR dataset are shown in Fig. 3.
HuGaDB Dataset. This dataset includes continual recordings of a variety of activities, including standing up, walking, and utilizing the stairs, etc.. The data was gathered using a six-wearable body sensor system, which included inertial sensors placed on the thighs, feet, and right and left shins. Two EMG sensors were also attached to the quadriceps to track muscle activation as well. Activities and their corresponding IDs are described in Table 4 whereas the number of samples per activity class present in HuGaDB dataset are shown in Fig. 4.        11 to final classification using wrapper-based deep feature selection algorithm has been given in Fig. 5. The entire framework can be subdivided into three phases: 1. Data Pre-processing: Transform the raw sensor time-series data into spectrogram images. 2. Feature Extraction: Extract deep features using two popular transfer learning model (Mobilenet_v3_Large and Efficientnet_b0) followed by concatenation of extracted features. 3. Feature Selection: Application of wrapper-based deep feature selection using BBA and classification of the target activities performed.
Data pre-processing. We employ two widely used transfer learning models that need images for training; as a consequence, raw sensor data are transformed into spectrogram images. As mentioned earlier, three recently developed HAR datasets such as HARTH, KU-HAR, and HuGaDB are being used in this work. The two sensor values (acceleration and gyroscopic) are present in all three HAR datasets. First, all sensor readings have been gathered activity-wise in the various data arrays using the raw data that has been received from the file. Both the generation of spectrogram images as well as activity-wise data arrays need the division of data arrays into numerous data frames (300 rows, in our case). The dataset's smallest activity data length is calculated among the activities that are regarded to have the most rows overall during this splitting. The prior step is crucial for ensuring a balanced distribution of samples among each activity class. The purpose of this step is to encode a spectrogram from time series data using the Pylab 42 Python package. Figures 6, 7 and 8 show samples of spectrograms generated for each human activities from the HARTH, KU-HAR and HuGaDB datasets respectively.
Feature extraction. Spectrogram images generated in the previous phase are fed to the Mobilenet_v3_Large and Efficientnet_b0 CNN transfer learning models. Following this, the extracted features are saved for each model separately for building the further phases of the total framework. Transfer learning models are trained on millions of image data samples as a result of which they can classify image or similar data with high performance. Hence, the sensor data modalities are converted to spectrogram images to leverage the pre-trained weights of the transfer learning CNN models.
Efficientnet. The new model, termed Efficientnet, has been introduced by Google 33 with the primary goal of introducing one that is more effective than the current state-of-the-art results. Usually, the models are made with an excessively high resolution, or they are overly deep and wide. The Google study describes a scaling strategy called compound scaling, which claims that intentionally scaling all three model attributes-depth, breadth, and resolution-at the same time produces better results than scaling any one of them alone. The fused-MB convolu-   www.nature.com/scientificreports/ tional layers are typical of the EfficientnetV2, which can employ current GPU/CPU accelerators and has fewer parameters and FLOPS. The original Efficientnet model has eight different blocks from b0 to b7 as shown in Fig. 9. Any network's stem is its primary component. From there, all experiments start with the architecture of the network, which is a trait shared by all eight models, and then with the network's final layers. Each of them has seven blocks after that. To train our spectrograms, we use the Efficientnet_b0 base model, which comprises seven inverted residual blocks, each with a different configuration. These blocks also incorporate excitation, squeeze, and swish activation. Each of them comprises seven blocks after that. In addition, each of these blocks has a different number of sub-blocks, whose total number rises from Efficientnet_b0 to Efficientnet_b7.
Mobilenet_v3. Mobilenet 43 is a type of convolutional neural network which is built for embedded vision systems and mobile applications. It is based on a streamlined architecture that employs depth-wise separable convolutions to construct lightweight deep neural networks with low latency for embedded and mobile devices. To cut down on the number of parameters, a depth-wise convolution has been implemented 44 . The procedure has been equally divided into two parts 45 ,

The depth-wise convolution 2. The point-wise convolution
Every channel's image input of shape (256 × 256) is filtered by depth-wise convolution then the point-wise convolution is put in a 1 × 1 convolution to integrate the output results of the previous layer. This results in a significant reduction in both model complexity as well as computational power. To improve the Mobilenet architecture, a upgraded version was introduced in 2019 named as Mobilenet_v3. and it was built by removing complex layers with employing the H-swish function rather than standard ReLU. Figure 10 summarizes the architecture of Mobilenet_v3 model used in the present work. Mobilenet_v3 is defined as two models: Mobilenet_v3_Large and Mobilenet_v3_Small. In terms of both performance and accuracy, Mobilenet_v3_Large model gives better result. In the present work, we have selected Mobilenet_v3_Large model to train our model.
Selection of optimal features using BBA. In our proposed paper we have utilized the wrapper function over a filter based methodology for the selection of features due to the following reasons:  www.nature.com/scientificreports/ • The filter based methods use statistical methods hence there is no interaction with the model for the optimal selection of feature subsets. Contrary to this, the wrapper methodologies interact with the models by comparing every conceivable feature combination to the evaluation criterion by employing a greedy search methodology. • While filter selection takes into account each feature separately, wrapper-based approaches take into account the dependencies between the features throughout the entire feature space, which results in improved computational performance.
Following the extraction of both model features of tensor size 1000, which is equal to the final linear layer of the two transfer learninig moedels, a single feature vector is created of tensor size 2000, which is then fed to the BBA 46 wrapper function to select the best optimal feature subset followed by the final classification of human activities using a k-nearest neighbours classifier. The KNN algorithm 47 is a supervised learning classifier that employs proximity to produce classifications or predictions about the grouping of a single feature point. Although it can be applied to classification or regression issues, it is commonly employed as a classification algorithm because it relies on the idea that comparable points can be discovered close to one another. The details of our experimental outcomes are included in the Results section.
Binary Bat Algorithm (BBA). Yang et al. 's 48 Bat Algorithm (BA), a novel meta-heuristic approach for continuous optimization, was based on the remarkable ability of microbats to detect their food and differentiate between various bug species-even in total darkness. Such an approach has shown to be more successful than several well-known optimization techniques drawn from nature. The basic BA flowchart is shown in Fig. 11. As per BA, 1. Each bat randomly assigned a frequency between [ fe min , fe max ]. 2. Every bat is associated with velocity ( V t j ) and position P t j in search space at each iteration t , with respect to frequency fe j . 3. Hence, at each iteration, we need to update fe j , V j and P j as per following equations.
where, β is random value which lies between [0, 1] and P * is current best.
Later, BBA was introduced by Nakamura et al. 48 . They changed the equation position of the basic BA and replaced with binary vectors by using one of the transfer functions, and otherwise it is structurally similar to the basic BA. The optimal result has been selected among the 2n possibilities.
Therefore, Eq. (3) of BBA can be replaced with Eq. (4), which is as follows:  Table 5 shows how the number of features decreases with the use of the BBA, which chooses the most pertinent features from all those obtained from both the Mobilenet_ v3 Large and Efficientnet_b0 models. If we examine the reduction of concatenated features by dataset, it can be found that almost 52% of HuGaDB dataset, 45% of KU-HAR dataset, and 60% of HARTH dataset has been optimally selected by the BBA. Furthermore, the overall classification accuracy of the proposed model has also been improved in case of all the three HAR datasets. Table 6 illustrates the increment in the overall classifica-    Table 6 that increments of about 20%, 21% and 6% in the overall classification accuracies have been noted for HARTH, KU-HAR and HuGaDB datasets respectively which is quite impressive. The proposed HAR model has also been assessed using other common performance measures such as F1-score, Recall, and Precision, which are presented in Table 7. It can be noticed from Table 7 that the application of wrapper-based feature selection using BBA on the concatenated feature vector produces Precision, Recall and F1-score of 0.90, 0.89 and 0.89 respectively for HARTH dataset, 0.97, 0.96 and 0.96 respectively for KU-HAR dataset, and 0.94, 0.93 and 0.93 respectively for HuGaDB dataset.
Loss plot. The loss plot gives us a summary of the training process and the network's learning process. The loss function is used to determine the quantitative loss measure at the designated epoch across all data items throughout the course of an epoch. Figures 12, 13, and 14 show the train and validation loss plots of the combination of each model performed on the HARTH, KU-HAR, and HuGaDB datasets respectively. The models are trained using each HAR datasets for 40 epochs, as is already indicated. Figures 12, 13, and 14 show that the loss values decreases with respect to epoch, indicating that the models are picking up on the input item and increasing the prediction probability. Figures 12 and 13 show that the training loss curve converges after 35 epochs for both HARTH, KU-HAR datasets, whereas Figure 14 shows that the training loss curve for the Efficientnet_b0 model converges after 35 epochs whereas in case of Mobilenet_v3 model, the loss converges after 20 epochs.

Analysis of confusion matrix.
A snippet of the classification problem's prediction outcomes is called a confusion matrix. Each activity class's share of successful and failed predictions is indicated with count values. Figure 15 illustrates how the magnitude of the confusion relies on the number of activity classes contained in the HAR dataset. The confusion matrix for HARTH and HuGaDB datasets is 12 by 12, whereas the confusion matrix for KU-HAR dataset is 18 by 18. Each row in the confusion matrix stands for an anticipated activity class. Every column in the confusion matrix corresponds to a unique class. The diagonal numbers indicate how frequently the samples are successfully categorised. The samples that the model is unable to correctly categorise are the numbers that are not on the diagonal. According to the confusion matrix (Fig. 15a) for the HARTH dataset, it can be seen that two instances of 'Running' activity class are mistaken for 'Walking' and 'Climbing stairs' classes, which are both continuous activities that involve a person moving from one place to another. Other activity classes, on the other hand, are correctly classified. Out of 52 samples in the KU-HAR dataset, the confusion matrix (shown in Fig. 15b) illustrates that five times the 'Pick' activity class which contain vertical acceleration, are incorrectly classified as 'Talk-Sit' , 'Stand-Sit' , and 'Run' activity classes. The other activity classes in the KU-HAR dataset are all accurately categorised.    Comparison with existing models. Comparison with other optimization algorithms. Utilizing five well-known metaheuristic optimization algorithms, including the Cuckoo Search Algorithm (CSA) 50 , Equilibrium Optimizer (EO) 51 , Genetic Algorithm (GA) 52 , Gravitational Search Algorithm (GSA) 53 , and Grey Wolf Optimizer (GWO) 54 , the proposed BBA-based deep feature optimization model has been tested in the current work. Table 9 enlists the performance results achieved by applying all the above meta-heuristic optimization algorithms. It can be noticed from  www.nature.com/scientificreports/ Table 9 that for both the HuGaDB and KU-HAR datasets, the proposed wrapper-based BBA has given the best classification accuracy with the least number of selected optimal features. However, in case of HARTH dataset, the CSA selects only 1 feature less than as compared to the proposed BBA but despite that our proposed model produces an accuracy gain of approximately 2%. This justifies the trade-off between classification accuracy and the number of optimal features selected which is negligible compared to the former. Following a comparison of the number of selected optimal features with the corresponding classification accuracies, it can be said that the BBA is one of the best-suited wrapper algorithm for each of the three HAR datasets.

Statistical significance test: McNemar's test.
In the preceding section, we thoroughly examined the performance of our proposed model on three HAR datasets and found that the proposed framework of the two base models outperforms each of them in terms of accuracy. To specifically show the effectiveness of our recommended Wrapper based classification model over the basic models, we ran a statistical significance test 55 known as the McNemar's test 56 .
The results of McNemar's test performed on the three HAR datasets-HARTH, KU-HAR, and HuGaDB datasets are presented in Table 10. McNemar's test's p-value should ideally be below 5% in order to reject the null hypothesis 57 , and Table 10 clearly shows that this is the case in every scenario where the p value is less than 0.05. www.nature.com/scientificreports/ Therefore, in every instance, the null hypothesis is rejected. As may be deduced from the aforementioned statistical tests, the findings obtained by the base models and suggested ensemble model are statistically significant, i.e., not equal. This explains why the proposed HAR framework integrates the supplementary details supplied by the individual classifiers and produces better predictions, setting the overall wrapper-based feature optimization model apart from all of the individual transfer learning models.

Conclusions & future works
HAR is still one of the challenging research areas for the research community due to the dynamic nature of the data since there is no predefined number of human activities list. Three distinct HAR datasets, namely, HARTH containing 12 human activities, KU-HAR containing 18 human activities, and HuGaDB containing 12 human activities, have been utilized in this study. It is to be noted that all the three HAR datasets, used in the present work, have a different collection of human activities. Developing distinct techniques for each HAR datasets is not feasible. Therefore, the goal is to create a generic HAR model that applies to any acceleration and gyroscope based sensor data. Here, a combination of two distinct kinds of transfer learning models along with a wrapperbased feature selection approach are used to achieve the same goal. The three recently developed benchmark HAR datasets namely, HARTH, KU-HAR, and HuGaDB, are used to test the proposed model, and the classification accuracies are found to be 88.89%, 97.9%, and 93.82%, respectively. The experimental results performed better than most recent state-of-the-art findings. It is to be noted that almost 52%, 45% and 60% of original number of deep features for HuGaDB, KU-HAR and HARTH datasets respectively, have been selected by the present wrapper-based feature selection method using BBA. On the other hand, a total of about 21%, 20% and 6% improvement in the overall classification accuracies have been attained by the proposed wrapper-based deep feature optimization methodology on HuGaDB, KU-HAR and HARTH datasets respectively. This is one of the major advantages of our proposed HAR framework. The proposed model can be tuned in future by applying the following approaches: • Filter method can be used to rank the features and instead of sending all features to the wrapper, they can be sent ranked-wise. • The proposed model can be integrated with different type IoT and smartphone devices for real-time HAR prediction. • The proposed HAR model can be validated on other publicly available datasets available in literature.
The primary contribution of this work is the proposed two-fold architecture model, which outperforms the prior state-of-the-art HAR models proposed in the literature. It offers a strong basis for combining a transfer learning deep feature extraction model with a wrapper-based feature selection approach that has already been trained for HAR. In the coming future, it may be integrated with many IoT and smartphone platforms, which is advantageous in a variety of fields like mobile app development, healthcare, and many more.

Data availability
No datasets are generated during the current study. The datasets analyzed during this work are made publicly available in this published article.